36 research outputs found

    Neural Based Statement Classification for Biased Language

    Biased language commonly occurs around controversial topics, stirring disagreement between the parties involved in a discussion. This is because the understanding and use of particular phrases is cohesive within a group, but such cohesiveness does not hold across groups. In collaborative environments or environments where impartial language is desired (e.g. Wikipedia, news media), statements and the language therein should represent the involved parties equally and be neutrally phrased. Biased language is introduced through the presence of inflammatory words or phrases, or statements that may be incorrect or one-sided, thus violating such consensus. In this work, we focus on the specific case of phrasing bias, which may be introduced through specific inflammatory words or phrases in a statement. For this purpose, we propose an approach that relies on recurrent neural networks to capture the inter-dependencies between words in a phrase that introduce bias. We perform a thorough experimental evaluation, where we show the advantages of a neural approach over competitors that rely on word lexicons and other hand-crafted features in detecting biased language. We are able to distinguish biased statements with a precision of P=0.92, thus significantly outperforming baseline models with an improvement of over 30%. Finally, we release the largest corpus of statements annotated for biased language. Comment: The Twelfth ACM International Conference on Web Search and Data Mining, February 11--15, 2019, Melbourne, VIC, Australia

    CAUSES AND CONSEQUENCES OF DOMESTIC VIOLENCE, SOCIO-CULTURAL DIFFERENCES IN KOSOVO

    In this paper we try to identify the causes and consequences of domestic violence in Kosovo. As a country whose society is undergoing a radical transition, Kosovo faces various challenges in building a state where social and political rights are equal for everyone, regardless of gender, age, religion, race, political orientation, language, etc. So far, a number of legal measures dealing with domestic violence have been taken. Under the pressure of European integration, Kosovo has approved the national program against domestic violence, the Law on the Family, the Law on Protection from Domestic Violence, and various strategies with international support, and nongovernmental organizations play an active role in advocating gender equality. Domestic violence as a social phenomenon has been examined in depth by social scholars as an act that violates human rights and the principle that all human beings are free and equal in rights and dignity. In this paper we discuss the official data related to domestic violence in Kosovo, covering cases that range from deaths and suicide to child abuse, disturbance, disagreement and other variables. We also explain how domestic violence is defined in Kosovo, from physical abuse to economic abuse. The main part of this article analyzes official data from studies, from security agencies such as the police and the justice system, and from nongovernmental organizations working on this issue. The aim of the state institutions is to prevent domestic violence, but what is the real situation in the field? Do they protect and secure the victims? Do they offer training and reintegration for the victims? These data give us a picture of the real situation of domestic violence in Kosovo, regardless of the different perceptions of this phenomenon in our society.

    Improving Entity Retrieval on Structured Data

    The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to its relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.

    Approaches for enriching and improving textual knowledge bases

    [no abstract]

    InstructPTS: Instruction-Tuning LLMs for Product Title Summarization

    E-commerce product catalogs contain billions of items. Most products have lengthy titles, as sellers pack them with product attributes to improve retrieval and highlight key product aspects. This results in a gap between such unnatural product titles and how customers refer to the products. It also limits how e-commerce stores can use these seller-provided titles for recommendation, QA, or review summarization. Inspired by recent work on instruction-tuned LLMs, we present InstructPTS, a controllable approach for the task of Product Title Summarization (PTS). Trained using a novel instruction fine-tuning strategy, our approach is able to summarize product titles according to various criteria (e.g. number of words in a summary, inclusion of specific phrases, etc.). Extensive evaluation on a real-world e-commerce catalog shows that compared to simple fine-tuning of LLMs, our proposed approach can generate more accurate product name summaries, with an improvement of over 14 and 8 BLEU and ROUGE points, respectively. Comment: Accepted by EMNLP 2023 (Industry Track)

    Answering Unanswered Questions through Semantic Reformulations in Spoken QA

    Spoken Question Answering (QA) is a key feature of voice assistants, usually backed by multiple QA systems. Users ask questions via spontaneous speech which can contain disfluencies, errors, and informal syntax or phrasing. This is a major challenge in QA, causing unanswered questions or irrelevant answers, and leading to bad user experiences. We analyze failed QA requests to identify core challenges: lexical gaps, proposition types, complex syntactic structure, and high specificity. We propose a Semantic Question Reformulation (SURF) model offering three linguistically-grounded operations (repair, syntactic reshaping, generalization) to rewrite questions to facilitate answering. Offline evaluation on 1M unanswered questions from a leading voice assistant shows that SURF significantly improves answer rates: up to 24% of previously unanswered questions obtain relevant answers (75%). Live deployment shows positive impact for millions of customers with unanswered questions; explicit relevance feedback shows high user satisfaction. Comment: Accepted by ACL 2023 Industry Track